07. Quiz: Q-Learning
Say that an agent is learning to navigate the gridworld described earlier in the lesson.
data:image/s3,"s3://crabby-images/72b0b/72b0b6e804593c583797a29f961a51ba00678561" alt="Gridworld Example"
Suppose the agent is using Q-Learning in its search for the optimal policy, with $\alpha = 0.1$.
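As a reminder, Q-Learning revises one Q-table entry after every time step. The rule below is written in standard textbook notation. $\alpha$ is the step size given above, and $\gamma$ is the discount rate, which this quiz does not specify:

$$
Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left( R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \right)
$$

The $\max_a$ term is what distinguishes Q-Learning. The target uses the greedy action in $S_{t+1}$, regardless of which action the agent actually takes next.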
At the end of the 99th episode, the Q-table has the following values:
data:image/s3,"s3://crabby-images/d70f2/d70f2d1eee5585f27e5aace2103ccc1a5702a36a" alt="Q-table"
Say that at the beginning of the 100th episode, the agent starts in state 1 and selects the action right. As a result, it receives a reward of -1, and the next state is state 2.
data:image/s3,"s3://crabby-images/d29f4/d29f42805206af72b216b7e8f4da6274e2df0b8e" alt="Beginning of the 100th episode"
In the previous video, you learned that at this point the agent updates the Q-table, specifically the entry for the state-action pair it just visited (state 1, action right).
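The single update the agent performs at this step can be sketched in Python. This is a minimal sketch that assumes the Q-table is stored as nested dictionaries and that the discount rate is $\gamma = 1$ (no discounting), because the quiz does not state one. The function name `q_learning_update` and every Q-value in the example are hypothetical placeholders. They are not the values from the Q-table above.

```python
from typing import Dict

# Q-table stored as nested dicts: state -> {action: estimated action value}.
QTable = Dict[int, Dict[str, float]]


def q_learning_update(Q: QTable, state: int, action: str, reward: float,
                      next_state: int, alpha: float, gamma: float = 1.0) -> float:
    """Apply one Q-learning update to Q[state][action] in place and return it.

    gamma defaults to 1.0 (no discounting). This is an ASSUMPTION, since the
    quiz does not state a discount rate.
    """
    # Greedy value of the next state: the max over the actions listed for it.
    best_next = max(Q[next_state].values())
    # TD target: observed reward plus the (discounted) greedy next-state value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha of the way toward the target.
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q[state][action]


# HYPOTHETICAL Q-values for illustration only; they are NOT the quiz's Q-table.
Q: QTable = {
    1: {"up": -2.0, "right": -4.0, "down": -3.0, "left": -5.0},
    2: {"up": -5.0, "right": -7.0, "down": -8.0, "left": -9.0},
}

# The transition from the 100th episode: state 1, action right, reward -1, next state 2.
new_value = q_learning_update(Q, state=1, action="right", reward=-1.0,
                              next_state=2, alpha=0.1)
print(new_value)
```

With these made-up numbers, the greedy estimate for state 2 is -5, so the target is -1 + (-5) = -6. The entry for (state 1, right) therefore moves one tenth of the way from -4 toward -6 and ends at -4.2. Only that one entry changes. The same arithmetic applies when you use the actual values from the Q-table image.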